Abstract

The canonical form of scale mixtures of multivariate skew-normal distributions is defined, 
emphasizing its role in summarizing some key properties of this class of distributions. It is 
also shown that the canonical form corresponds to an affine invariant co-ordinate system as 
defined in Tyler et al. (2009), and a method for obtaining the linear transform that converts 
a scale mixture of multivariate skew-normal distributions into a canonical form is presented. 
Related results, where the particular case of the multivariate skew t distribution is considered 
in greater detail, are the general expression of the Mardia indices of multivariate skewness 
and kurtosis and the reduction of dimensionality in calculating the mode. 

1 Introduction 



The Gaussian model plays a central role in statistical modelling; nevertheless the need for flexible 
multivariate parametric models which are able to represent departures from normality is testified 
by the increasing weight of the literature devoted to this issue during the last decade. Departure 
from normality can take place in different ways, such as multimodality, lack of central symmetry, 
and excess or negative excess of kurtosis. The present paper focuses on the last two features, 
considering the class of distributions generated by scale mixtures of the d-dimensional skew-normal 
random variables defined by Azzalini and Dalla Valle (1996). 

The class of scale mixtures of skew-normal distributions includes parameters that regulate 
both skewness and kurtosis, and reduces to the class of scale mixtures of normal distributions 
when the skewness parameter vanishes. Finally, the skew-normal distribution is recovered when 
the mixing distribution corresponds to a random variable that is equal to one with probability 1. 
Among the members of this family, whose general form was first introduced by Branco 
and Dey (2001), the skew t distribution is the one that has received the greatest attention; it 
corresponds to the case where the mixing variable is $W^{-1/2}$, where $W$ is a Gamma$(\nu/2, \nu/2)$ 
random variable. Azzalini and Capitanio (2003) developed a systematic study of its main 
probabilistic properties as well as statistical issues; however, some aspects were left unexplored, 
such as the expression of suitable indices of multivariate skewness and kurtosis and a formal proof 
of unimodality. The usefulness of the skew t distribution has been explored in different applied 
problems. Azzalini and Genton (2008) proposed and discussed the use of the multivariate skew 
t distribution as an attractive alternative to the classic robustness approach, and Walls (2005), 
Meucci (2006) and Adcock (2009), among others, adopted this model to represent relevant 
features of financial data. Another member which has been studied in some detail is the multivariate 
skew-slash distribution, defined by Wang and Genton (2006), which is obtained when the 
mixing variable is $U^{-1/q}$, where $U$ is a uniform random variable on the interval $(0, 1)$ and $q$ is a real 
parameter greater than zero. 

This paper introduces the definition of a canonical form associated with scale mixtures of 
skew-normal distributions, which generalizes the analogous one introduced in Azzalini and 
Capitanio (1999) for the multivariate skew-normal distribution. The motivation is its suitability in 
allowing a simplified representation of some relevant features which are shared by all the 
members of the class of scale mixtures of skew-normal distributions. In fact the components of the 
canonical form are such that all but one are symmetric: the skewed component summarizes the 
skewness of the distribution as a whole, leading to considerable simplifications in obtaining 
summary measures of the data shape. For instance, compact general expressions for the indices of 
multivariate skewness and kurtosis defined by Mardia (1970, 1974) are obtained for the entire class of scale 
mixtures of skew-normal distributions. It will also be shown that a data 
transformation leading to a canonical form generates an affine invariant co-ordinate system of the kind 
defined and discussed in Tyler et al. (2009) in connection with a general method for exploring 
multivariate data. 

2 The skew-normal distribution and its canonical form 

The multivariate skew-normal distribution was defined in Azzalini and Dalla Valle (1996). 
The parameterization adopted in the present paper is the one introduced by Azzalini and 
Capitanio (1999), who further explored the properties of this family. 

A d-dimensional variate $Z$ is said to have a skew-normal distribution if its density function is 

$$f(z) = 2\,\phi_d(z - \xi; \Omega)\,\Phi\{\alpha^\top \omega^{-1}(z - \xi)\}, \qquad z \in \mathbb{R}^d, \tag{1}$$

where $\phi_d(z; \Omega)$ denotes the d-dimensional normal density with zero mean and full rank 
covariance matrix $\Omega$, $\Phi$ is the $N(0, 1)$ distribution function, $\xi \in \mathbb{R}^d$ is the location parameter, $\omega$ is a 
diagonal matrix of scale parameters such that $\bar\Omega = \omega^{-1}\Omega\omega^{-1}$ is a correlation matrix, and $\alpha \in \mathbb{R}^d$ 
is a shape parameter which regulates departure from symmetry. Note that when $\alpha = 0$ the 
normal density is recovered. A random variable with density (1) will be denoted by $SN_d(\xi, \Omega, \alpha)$. 
The skew-normal distribution shares many properties with the normal family, such as closure 
under marginalization and affine transforms, and the $\chi^2$ distribution of certain quadratic forms. See 
Azzalini and Capitanio (1999) for details on these issues. For later use we recall that the mean 
vector and the covariance matrix of $Z$ are 

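$$\mu = \xi + \left(\frac{2}{\pi}\right)^{1/2}\omega\delta \quad\text{and}\quad \Sigma = \Omega - \frac{2}{\pi}\,\omega\delta\delta^\top\omega, \tag{2}$$

where 

$$\delta = (1 + \alpha^\top\bar\Omega\alpha)^{-1/2}\,\bar\Omega\alpha \tag{3}$$

is a vector whose elements lie in the interval $(-1, 1)$. From (3) we have also 

$$\alpha = (1 - \delta^\top\bar\Omega^{-1}\delta)^{-1/2}\,\bar\Omega^{-1}\delta. \tag{4}$$
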
It is important to note that the shape parameter of a marginal component of $Z$ is in general 
not equal to the corresponding component of $\alpha$. More specifically, when $Z$ is partitioned as 
$Z = (Z_1^\top, Z_2^\top)^\top$, with $Z_1$ and $Z_2$ of dimension $h$ and $d - h$, respectively, the shape parameter of 
the marginal component $Z_1$ is given by 

$$\bar\alpha_1 = \frac{\alpha_1 + \bar\Omega_{11}^{-1}\bar\Omega_{12}\alpha_2}{(1 + \alpha_2^\top\bar\Omega_{22\cdot 1}\alpha_2)^{1/2}},$$

where $\bar\Omega_{22\cdot 1} = \bar\Omega_{22} - \bar\Omega_{21}\bar\Omega_{11}^{-1}\bar\Omega_{12}$, and $\bar\Omega_{ij}$ and $\alpha_i$, for $i, j = 1, 2$, denote the blocks of 
the corresponding partitions of $\bar\Omega$ and $\alpha$, respectively. By contrast, the entries of the vector 
$\delta$ after marginalization are obtained by extracting the corresponding components of the original 
parameter. 
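
The relations between $\alpha$, $\delta$ and the first two moments can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the parameter values are illustrative and are not part of the paper. It evaluates $\delta$ from $\alpha$, inverts the map by solving $\bar\Omega^{-1}\delta$, and builds the mean vector and covariance matrix of an $SN_d(\xi, \Omega, \alpha)$ variate.

```python
import numpy as np

def sn_delta(alpha, omega_bar):
    """delta = (1 + alpha' Obar alpha)^(-1/2) Obar alpha."""
    alpha = np.asarray(alpha, dtype=float)
    oa = omega_bar @ alpha
    return oa / np.sqrt(1.0 + alpha @ oa)

def sn_alpha(delta, omega_bar):
    """Inverse map: alpha = (1 - delta' Obar^{-1} delta)^(-1/2) Obar^{-1} delta."""
    delta = np.asarray(delta, dtype=float)
    oinv_d = np.linalg.solve(omega_bar, delta)
    return oinv_d / np.sqrt(1.0 - delta @ oinv_d)

def sn_mean_cov(xi, Omega, alpha):
    """Mean vector, covariance matrix and delta of SN_d(xi, Omega, alpha)."""
    Omega = np.asarray(Omega, dtype=float)
    w = np.sqrt(np.diag(Omega))            # diagonal entries of omega
    omega_bar = Omega / np.outer(w, w)     # correlation matrix
    delta = sn_delta(alpha, omega_bar)
    wd = w * delta                         # the vector omega * delta
    mean = np.asarray(xi, dtype=float) + np.sqrt(2.0 / np.pi) * wd
    cov = Omega - (2.0 / np.pi) * np.outer(wd, wd)
    return mean, cov, delta

# Illustrative parameter values (hypothetical, chosen only for the example).
xi = np.array([0.0, 1.0, -1.0])
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 1.0, 0.2],
                  [0.5, 0.2, 2.25]])
alpha = np.array([3.0, -1.0, 0.5])
mean, cov, delta = sn_mean_cov(xi, Omega, alpha)
```

Solving a linear system instead of forming $\bar\Omega^{-1}$ explicitly is a numerical design choice of the sketch, not something prescribed by the text.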

Azzalini and Capitanio (1999, Proposition 4) introduced a canonical form associated to a 
skew-normal variate, via the following result. 

Proposition 1 Let $Z \sim SN_d(\xi, \Omega, \alpha)$ and consider the affine non singular transform $Z^* = 
(C^{-1}P)^\top\omega^{-1}(Z - \xi)$, where $C^\top C = \bar\Omega$ and $P$ is an orthogonal matrix having the first column 
proportional to $C\alpha$. Then $Z^* \sim SN_d(0, I_d, \alpha_{Z^*})$, where $\alpha_{Z^*} = (\alpha_*, 0, \dots, 0)^\top$ and $\alpha_* = 
(\alpha^\top\bar\Omega\alpha)^{1/2}$. 

The above authors called the variate $Z^*$ a canonical form of $Z$. With respect to the original 
definition, and without loss of generality, here it is assumed that the non-zero element of the 
shape vector $\alpha_{Z^*}$ is the first one. The above result can be easily verified by applying Proposition 3 
of Azzalini and Capitanio (1999). Furthermore, using their Propositions 5 and 6 it is immediate 
to see that $Z_1^* \sim SN_1(0, 1, \alpha_*)$ while the remaining components of $Z^*$ are $N(0, 1)$ variates, and 
that in addition the components of $Z^*$ are mutually independent. Finally, it is remarked that the 
linear transform leading to a canonical form is not unique. 
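
One way to build the matrices $C$ and $P$ of Proposition 1 is sketched below, assuming NumPy; the function name and the test values are hypothetical. Here $C$ is taken from a Cholesky factorization of $\bar\Omega$ and $P$ from a QR factorization whose first input column is $C\alpha$. This is only one admissible choice, since the transform is not unique.

```python
import numpy as np

def canonical_transform(Omega, alpha):
    """Return A = (C^{-1} P)' omega^{-1}, the vector P' C alpha, and alpha_*."""
    Omega = np.asarray(Omega, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    d = alpha.size
    w = np.sqrt(np.diag(Omega))
    omega_bar = Omega / np.outer(w, w)
    C = np.linalg.cholesky(omega_bar).T       # upper triangular, C' C = omega_bar
    v = C @ alpha
    # Orthogonal P whose first column is proportional to C alpha.
    P, _ = np.linalg.qr(np.column_stack([v, np.eye(d)]))
    if P[:, 0] @ v < 0:                       # fix the sign left free by QR
        P[:, 0] = -P[:, 0]
    A = np.linalg.solve(C, P).T / w           # (C^{-1} P)' omega^{-1}
    alpha_star = np.sqrt(alpha @ omega_bar @ alpha)
    return A, P.T @ v, alpha_star

# Illustrative values (hypothetical).
xi = np.array([0.0, 1.0, -1.0])
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 1.0, 0.2],
                  [0.5, 0.2, 2.25]])
alpha = np.array([3.0, -1.0, 0.5])
A, shape_star, alpha_star = canonical_transform(Omega, alpha)
```

With these quantities, `A @ Omega @ A.T` should be the identity and `shape_star` should have the form $(\alpha_*, 0, \dots, 0)^\top$, so that $Z^* = A(Z - \xi)$ has the structure stated in the proposition.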

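Azzalini and Capitanio (1999) underlined how this transformation plays a role analogous 
to the one which converts a multivariate normal variable into a spherical form. Motivated by 
the expressions they obtained for the indices of multivariate skewness and kurtosis defined by 
Mardia (1970), they also highlighted the role of $\alpha_*$ as a quantity summarizing the shape of the 
distribution. In fact the two indices are 

$$\gamma_{1,d} = \left(\frac{4-\pi}{2}\right)^2\left\{\frac{2\alpha_*^2}{\pi + (\pi-2)\alpha_*^2}\right\}^3, \tag{5}$$

$$\gamma_{2,d} = 2(\pi-3)\left\{\frac{2\alpha_*^2}{\pi + (\pi-2)\alpha_*^2}\right\}^2, \tag{6}$$

and they depend on $\alpha$ and $\bar\Omega$ only via $\alpha_*$. 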

As an additional comment, note that by comparing expressions (5) and (6) with the 
corresponding ones for a univariate skew-normal distribution (see Azzalini 1985, sect. 2.3), and taking 
into account (3), it turns out that the values of the multivariate skewness and kurtosis indices 
are equal to those of the corresponding univariate indices for a skew-normal distribution having 
shape parameter equal to $\alpha_*$. In this sense, the canonical form is characterized by one component 
absorbing the departure from normality of the whole distribution. 

Notice also that on using expression (3), the parameter $\delta$ associated with the 
skewed component of a canonical form turns out to be $\delta_* = (\delta^\top\bar\Omega^{-1}\delta)^{1/2}$. Because of the 
one-to-one correspondence between $\alpha_*$ and $\delta_*$, it makes no difference which one is used as 
summary quantity. 
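
The dependence of the two Mardia indices on $\alpha_*$ alone can be illustrated with a short computation. The sketch below uses only the standard library and assumes the skew-normal expressions of Azzalini and Capitanio (1999) written in terms of $\alpha_*$, namely $\gamma_{1,d} = \{(4-\pi)/2\}^2 r^3$ and $\gamma_{2,d} = 2(\pi-3) r^2$ with $r = 2\alpha_*^2/\{\pi + (\pi-2)\alpha_*^2\}$; the function names are illustrative.

```python
import math

def sn_mardia(alpha_star):
    """Mardia skewness and excess kurtosis of a skew-normal variate,
    written as functions of alpha_* only (assumed expressions, see lead-in)."""
    r = 2.0 * alpha_star ** 2 / (math.pi + (math.pi - 2.0) * alpha_star ** 2)
    gamma_1d = ((4.0 - math.pi) / 2.0) ** 2 * r ** 3
    gamma_2d = 2.0 * (math.pi - 3.0) * r ** 2
    return gamma_1d, gamma_2d

def sn_univariate_indices(alpha):
    """Pearson skewness and excess kurtosis of a scalar SN(0, 1, alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    mu = math.sqrt(2.0 / math.pi) * delta      # mean of the scalar variate
    var = 1.0 - mu ** 2
    gamma_1 = (4.0 - math.pi) / 2.0 * mu ** 3 / var ** 1.5
    gamma_2 = 2.0 * (math.pi - 3.0) * mu ** 4 / var ** 2
    return gamma_1, gamma_2

# The multivariate indices match the univariate ones evaluated at alpha_*
# (Mardia's skewness index is the square of the Pearson index).
g1d, g2d = sn_mardia(2.0)
g1, g2 = sn_univariate_indices(2.0)
```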

Some results contained in Tyler et al. (2009) allow us to gain new insight into the role 
of the canonical transformation $Z^*$. These authors introduced a general method for exploring 
multivariate data, based on a particular invariant co-ordinate system, which relies on the 
eigenvalue-eigenvector decomposition of one scatter matrix relative to another. The canonical 
transformation $Z^*$ turns out to be an invariant co-ordinate system transformation with respect to the scatter 
matrices $\Omega$ and $\Sigma$, and taking into account the results of Section 3 of Tyler et al. (2009), a method 
to obtain a matrix $H$ such that $Z^* = H^\top(Z - \xi)$ can be explicitly stated. 

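Proposition 2 Let $Z \sim SN_d(\xi, \Omega, \alpha)$, and define $M = \Omega^{-1/2}\Sigma\,\Omega^{-1/2}$, where $\Omega^{1/2}$ is the unique 
positive definite symmetric square root of $\Omega$, and $\Sigma$ is the covariance matrix of $Z$. Let $Q\Lambda Q^\top$ 
denote the spectral decomposition of $M$. Then the transform 

$$Z^* = H^\top(Z - \xi), \qquad\text{where } H = \Omega^{-1/2}Q,$$

converts $Z$ into a canonical form and is an 
invariant co-ordinate system transformation based on the simultaneous diagonalization of the 
scatter matrices $\Omega$ and $\Sigma$. 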

Proof. Consider the simultaneous diagonalization of the scatter matrices $\Omega = E\{(Z - \xi)(Z - \xi)^\top\}$ and $\Sigma$, 
and let $\Omega^{-1/2}$ denote the inverse of the unique positive definite symmetric square root of $\Omega$. Following Tyler et 
al. (2009, Section 3), a matrix $H$ such that $H^\top\Omega H = I_d$ and $H^\top\Sigma H = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ turns 
out to be $\Omega^{-1/2}Q$, where $\lambda_1 \le \lambda_2 \le \dots \le \lambda_d$ are the eigenvalues of $\Omega^{-1}\Sigma$, or equivalently of 
$M = \Omega^{-1/2}\Sigma\,\Omega^{-1/2}$, and where the ith column of the $d \times d$ orthogonal matrix $Q$ is the normalized 
eigenvector of $M$ corresponding to the ith smallest eigenvalue. Furthermore, the ith column of $H$ 
is the eigenvector of $\Omega^{-1}\Sigma$ corresponding to the ith smallest eigenvalue of $\Omega^{-1}\Sigma$. The transform 
$H^\top Z$ corresponds to an invariant co-ordinate system, as defined in Tyler et al. (2009, p. 558). 
After some straightforward algebra, the eigenvalues of $\Omega^{-1}\Sigma$ turn out to be 1, with multiplicity 
$d - 1$, and $1 - (2/\pi)\delta_*^2$, and the associated eigenspaces are the orthogonal complement of the 
subspace spanned by $\omega\delta$, and the subspace spanned by $\omega^{-1}\alpha$, respectively. This fact implies 
that the first row of $H^{-1}$ is proportional to $\omega\delta$, while the last $d - 1$ rows lie in the orthogonal 
complement of the subspace spanned by $\omega^{-1}\alpha$. On using the expressions for the parameters of 
a linear transformation of a skew-normal variate given in Azzalini and Capitanio (1999, p. 585), 
the distribution of $Z^* = H^\top(Z - \xi)$ is $SN_d(0, I_d, H^{-1}\omega^{-1}\alpha)$; taking into account the structure of 
the matrix $H^{-1}$ and the equality $H^{-1}\Omega^{-1}(H^{-1})^\top = I_d$ we obtain $H^{-1}\omega^{-1}\alpha = (\alpha_*, 0, \dots, 0)^\top$, 
and hence the variate $Z^*$ corresponds to the canonical form of $Z$. QED 

The proof of Proposition 2 contains a description of the structure of the matrix $H$, which 
is shown to have one column proportional to $\omega^{-1}\alpha$ and the remaining ones belonging to the 
orthogonal complement of the subspace spanned by $\omega\delta$. This result implies that the projection 
$\alpha^\top\omega^{-1}Z$ captures all the skewness and the kurtosis of the joint distribution, whereas by 
projecting $Z$ onto the orthogonal complement of the subspace spanned by $\omega\delta$ independent $N(0, 1)$ 
variates are obtained. 
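
The construction used in Proposition 2 can be traced numerically. The sketch below assumes NumPy and takes $H = \Omega^{-1/2}Q$, with $Q$ the eigenvectors of $M = \Omega^{-1/2}\Sigma\,\Omega^{-1/2}$ ordered by increasing eigenvalue; the function name and parameter values are hypothetical. Since eigenvectors are defined up to sign, the first entry of the resulting shape vector may come out as $-\alpha_*$, in which case the first column of $H$ can be negated.

```python
import numpy as np

def ics_canonical_matrix(Omega, alpha):
    """Return H = Omega^{-1/2} Q, the eigenvalues of M, and delta."""
    Omega = np.asarray(Omega, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    w = np.sqrt(np.diag(Omega))
    omega_bar = Omega / np.outer(w, w)
    oa = omega_bar @ alpha
    delta = oa / np.sqrt(1.0 + alpha @ oa)
    wd = w * delta
    Sigma = Omega - (2.0 / np.pi) * np.outer(wd, wd)   # covariance matrix of Z
    # Symmetric inverse square root of Omega from its spectral decomposition.
    vals, vecs = np.linalg.eigh(Omega)
    omega_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    M = omega_inv_half @ Sigma @ omega_inv_half
    lam, Q = np.linalg.eigh(M)                         # eigenvalues in ascending order
    return omega_inv_half @ Q, lam, delta

# Illustrative values (hypothetical).
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 1.0, 0.2],
                  [0.5, 0.2, 2.25]])
alpha = np.array([3.0, -1.0, 0.5])
H, lam, delta = ics_canonical_matrix(Omega, alpha)
w = np.sqrt(np.diag(Omega))
shape_star = np.linalg.solve(H, alpha / w)             # H^{-1} omega^{-1} alpha
```

In this example the smallest eigenvalue should equal $1 - (2/\pi)\delta_*^2$, the remaining ones should equal 1, and `shape_star` should vanish outside its first entry.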

A matrix $H$ converting a skew-normal variate to its canonical form can also be obtained 
through the simultaneous diagonalization of pairs of scatter matrices other than $\Omega$ and $\Sigma$. It is 
expected that, when two scatter matrices, $V_1$ and $V_2$ say, both become diagonal when 
the variate $Z$ is in canonical form, Proposition 2 continues to be valid if the matrices $\Omega$ 
and $\Sigma$ are replaced by $V_1$ and $V_2$. An example of such matrices will be given in Proposition 6 at 
the end of Section 4. 

3 Scale mixtures of skew-normal variates and their canonical form 

In this section a canonical form analogous to the one introduced for the skew-normal distribution 
is defined for scale mixtures of skew-normal distributions, and some properties are given. 

3.1 Scale mixtures of skew-normal variates 

Scale mixtures of skew-normal distributions have been considered in Branco and Dey (2001). 
This class of distributions contains the corresponding class of scale mixtures of normal distributions 
and the skew-normal distribution as proper members, so that a wide range of shapes can be modelled. 
A scale mixture of skew-normal distributions is defined as follows. 

Definition 1 Let $Y = \xi + \omega S Z$, where $Z \sim SN_d(0, \bar\Omega, \alpha)$ and $S > 0$ is an independent scalar 
random variable. Then the variate $Y$ is a scale mixture of skew-normal distributions, with 
location and scale parameters $\xi$ and $\omega$, respectively. 

Note that, when $\alpha = 0$, $Y$ reduces to the corresponding scale mixture of $N_d(0, \Omega)$ distributions. 

The mth order moments of $Y$ can be calculated by differentiating the moment generating 
function given in Branco and Dey (2001, expression 4.1). An alternative and simpler way 
to obtain moments is to follow the scheme used by Azzalini and Capitanio (2003, expression 
(28)) for the moments of the skew t distribution, which arises when $S = W^{-1/2}$, and $W$ is a 
Gamma$(\nu/2, \nu/2)$ random variable. Specifically, assuming that $\xi = 0$ and $\omega = I_d$, by exploiting 
the stochastic representation given in Definition 1 we obtain 

$$E(Y^{(m)}) = E(S^m)\,E(Z^{(m)}), \tag{7}$$

where $Y^{(m)}$ denotes a moment of order $m$. Note that to use this formula only the knowledge of 
the mth order moments of $S$ and $Z$ is required. 

An appealing property of an $SN_d(0, \Omega, \alpha)$ variate is that the distribution of any even 
function of it is equal to the one obtained by applying the same even function to a $N_d(0, \Omega)$ variate. 
This fact can be easily seen by considering Proposition 2 in Azzalini and Capitanio (2003) and 
noting that the skew-normal distribution belongs to the broader class of distributions generated by 
perturbation of symmetry with which the proposition is concerned. As a corollary it follows from 
(7) that even order moments of $Y - \xi$ are equal to those of the corresponding scale mixture of 
normal distributions. On using (7) and taking into account (2), the mean vector and the covariance 
matrix of $Y$ are 

$$E(Y) = \xi + E(S)\left(\frac{2}{\pi}\right)^{1/2}\omega\delta \quad\text{and}\quad \mathrm{var}(Y) = E(S^2)\,\Omega - E(S)^2\,\frac{2}{\pi}\,\omega\delta\delta^\top\omega, \tag{8}$$

in agreement with those obtained by Branco and Dey (2001). 

Scale mixtures of skew-normal distributions are models capable of accounting for both skewness 
and kurtosis, and it is important to have available the expressions of measures of these two 
features. The next proposition introduces the expression of the Pearson indices of skewness and 
kurtosis for the univariate case; the multivariate case will be considered later, as the introduction 
of the canonical form of $Y$ makes the problem simpler to handle. 



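Proposition 3 Let $Y = \xi + \omega S Z$, where $Z \sim SN_1(0, 1, \alpha)$ and $S > 0$ is a scalar random 
variable. Then, provided that the moments up to order three or up to order four of $S$ exist, the 
expressions of the skewness and excess of kurtosis indices $\gamma_1$ and $\gamma_2$ are 

$$\gamma_1 = \sigma_Y^{-3}\left(\frac{2}{\pi}\right)^{1/2}\left\{\left[\frac{4}{\pi}E(S)^3 - E(S^3)\right]\delta^3 + 3\left[E(S^3) - E(S)E(S^2)\right]\delta\right\}, \quad\text{and}$$

$$\gamma_2 = \sigma_Y^{-4}\left\{\frac{8}{\pi}\left[E(S)E(S^3) - \frac{3}{\pi}E(S)^4\right]\delta^4 + \frac{24}{\pi}\left[E(S)^2E(S^2) - E(S)E(S^3)\right]\delta^2 + 3\left[E(S^4) - E(S^2)^2\right]\right\},$$

where $\delta = \alpha/(1 + \alpha^2)^{1/2}$ and $\sigma_Y^2 = E(S^2) - (2/\pi)E(S)^2\delta^2$ is the variance of $Y$ when $\omega = 1$. 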



Proof. Since the two indices are location and scale invariant, the case where $\xi = 0$ and $\omega = 1$ 
will be considered. The third and the fourth cumulants of $Y$ required to compute $\gamma_1$ and $\gamma_2$ 
are functions of the first four non central moments of $Y$, which in turn, taking into account (7), 
depend on the corresponding moments of $Z$. The first moment of $Z$ is given in (2), and taking 
into account that $Z^2 \sim \chi_1^2$ (see Azzalini 1985, property H) the second and the fourth ones are 
equal to 1 and 3, respectively. Finally, by differentiating the moment generating function of the scalar 
skew-normal distribution given in Azzalini (1985, p. 174), the third moment of $Z$ turns out to be 
$3(2/\pi)^{1/2}\delta - (2/\pi)^{1/2}\delta^3$. After some algebra the result follows. QED 

Note that when $\delta = 0$ the variate $Y$ is a scale mixture of $N(0, 1)$, so that the index $\gamma_1$ 
becomes zero and $\gamma_2 = \sigma_Y^{-4}\,3\,[E(S^4) - E(S^2)^2]$ measures the excess of kurtosis of $Y$. When 
$S$ is degenerate and $S = 1$, the expressions of the two indices for the skew-normal distribution 
are recovered. When $S$ is the inverse of the square root of a Gamma$(\nu/2, \nu/2)$ random variable, 
$Y$ follows a scalar skew t distribution, and the two indices coincide with those given in Azzalini 
and Capitanio (2003, p. 382). 
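
As a worked illustration of the univariate case, the sketch below computes the two indices from the first four moments of $S$ using only the standard library. It assumes $\xi = 0$ and $\omega = 1$, and it uses the central moments that follow from $E(Y^m) = E(S^m)E(Z^m)$ together with $E(Z) = (2/\pi)^{1/2}\delta$, $E(Z^2) = 1$, $E(Z^3) = (2/\pi)^{1/2}(3\delta - \delta^3)$ and $E(Z^4) = 3$. The function names are illustrative, and the mixing moments shown are those of the skew t case, $S = W^{-1/2}$ with $W$ a Gamma$(\nu/2, \nu/2)$ variable.

```python
import math

def t_mixing_moment(m, nu):
    """E(S^m) for S = W^(-1/2), W ~ Gamma(nu/2, nu/2); exists only for nu > m."""
    if nu <= m:
        raise ValueError("the moment of order m exists only for nu > m")
    return (nu / 2.0) ** (m / 2.0) * math.exp(
        math.lgamma((nu - m) / 2.0) - math.lgamma(nu / 2.0))

def univariate_indices(delta, m1, m2, m3, m4):
    """Skewness and excess kurtosis of Y = S Z, Z ~ SN_1(0, 1, alpha),
    given delta = alpha / sqrt(1 + alpha^2) and m_k = E(S^k)."""
    b2 = 2.0 / math.pi
    var = m2 - b2 * m1 ** 2 * delta ** 2
    # Third central moment of Y.
    mu3 = math.sqrt(b2) * ((2.0 * b2 * m1 ** 3 - m3) * delta ** 3
                           + 3.0 * (m3 - m1 * m2) * delta)
    # Fourth central moment minus 3 var^2 (numerator of the excess kurtosis).
    k4 = (4.0 * b2 * (m1 * m3 - 1.5 * b2 * m1 ** 4) * delta ** 4
          + 12.0 * b2 * (m1 ** 2 * m2 - m1 * m3) * delta ** 2
          + 3.0 * (m4 - m2 ** 2))
    return mu3 / var ** 1.5, k4 / var ** 2

# Scalar skew t with nu = 10 and delta = 0.8 (hypothetical values).
nu = 10.0
moments = [t_mixing_moment(m, nu) for m in (1, 2, 3, 4)]
gamma_1, gamma_2 = univariate_indices(0.8, *moments)
```

Passing `m1 = m2 = m3 = m4 = 1` corresponds to a degenerate $S = 1$ and should return the skew-normal indices.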

3.2 The canonical form of scale mixtures of skew-normal distributions 

The canonical form for scale mixtures of skew-normal distributions is defined in the following 
way. 

Definition 2 Let $Y = \xi + \omega S Z$, where $Z \sim SN_d(0, \bar\Omega, \alpha)$ and $S > 0$ is an independent scalar 
random variable. The variate $Y^* = (C^{-1}P)^\top\omega^{-1}(Y - \xi) = S Z^*$, where the matrices $P$ and $C$ 
are as in Proposition 1, will be called a canonical form of $Y$. 

From the above definition, it is straightforward to see that Proposition 2 can be extended to scale 
mixtures of skew-normal variates, that is, the linear transform $Y^* = H^\top(Y - \xi)$, where $H$ is 
defined as in Proposition 2, converts $Y$ into a canonical form. 

The next proposition states some properties of $Y^*$. 

Proposition 4 Under the settings of Definition 2 the following facts hold. 

(i) Only the first univariate component of $Y^*$ can be skewed. More specifically, $Y_1^*$ is a scale 
mixture of an $SN_1(0, 1, \alpha_*)$ variate, where $\alpha_* = (\alpha^\top\bar\Omega\alpha)^{1/2}$, and its mean and variance 
are 

$$\mu_* = E(S)(2/\pi)^{1/2}\delta_*, \qquad \sigma_*^2 = E(S^2) - (2/\pi)E(S)^2\delta_*^2,$$

respectively, where $\delta_* = (\delta^\top\bar\Omega^{-1}\delta)^{1/2}$. The remaining components are identically 
distributed scale mixtures of $N_1(0, 1)$ distributions, that is, symmetric about zero random 
variables with variance $\sigma^2 = E(S^2)$. 

(ii) The $d$ components of $Y^*$ are uncorrelated. 

(iii) The non zero elements of the set of moments $E(Y^{*(3)})$ are 

$$E(Y_1^{*3}) = E(S^3)(2/\pi)^{1/2}\delta_*(3 - \delta_*^2)$$

and 

$$E(Y_1^* Y_i^{*2}) = E(S^3)(2/\pi)^{1/2}\delta_*, \qquad i = 2, \dots, d.$$

(iv) The non zero elements of the set of moments $E(Y^{*(4)})$ are 

$$E(Y_i^{*4}) = 3E(S^4), \qquad (i = 1, \dots, d),$$
$$E(Y_i^{*2} Y_j^{*2}) = E(S^4), \qquad (i, j = 1, \dots, d,\ i \ne j).$$

Proof. (i) By definition $Y^* = S Z^*$; the result follows taking into account that $Z_1^* \sim SN_1(0, 1, \alpha_*)$ 
whilst the last $d - 1$ components of $Z^*$ are $N(0, 1)$. The expressions for the means and the 
variances can be obtained by (8) taking into account (3). (ii) Using (3) the vector $\delta$ associated with 
$Y^*$ becomes $(\delta_*, 0, \dots, 0)^\top$, where $\delta_* = (\delta^\top\bar\Omega^{-1}\delta)^{1/2}$; taking into account the expression of 
$\mathrm{var}(Y)$ given in (8), we see that $\mathrm{cov}(Y_i^*, Y_j^*) = 0$ for $i \ne j$. (iii)-(iv) From expression (7) we have 
$E(Y^{*(m)}) = E(S^m)E(Z^{*(m)})$. The result follows taking into account that the components of 
$Z^*$ are mutually independent and the expressions of their moments. QED 

The above results show that the main features of the canonical form of the skew-normal 
distribution are preserved when a scale mixture is considered. In fact only the first component 
is skewed, and the influence of the parameters $\bar\Omega$ and $\alpha$ is completely summarized by the quantity 
$\alpha_*$, or equivalently by $\delta_*$. Independence among the components is replaced by zero 
correlation, as expected since scale mixtures of normal distributions themselves do not allow 
independence between components to be modelled. 



4 Mardia indices of multivariate skewness and kurtosis 

The canonical form of Y can lead to dramatic simplification in calculating quantities which are 
invariant or equivariant with respect to invertible affine transformations. This is the case, for 
instance, of the Mardia indices of multivariate skewness and kurtosis and of the mode. In this 
section the Mardia indices will be considered, while the latter issue will be developed in the next 
section. 

Given a d-dimensional random variable $Y$, the Mardia indices of multivariate skewness 
and excess of kurtosis are defined as follows 

$$\gamma_{1,d} = \beta_{1,d} = \sum_{ijk}\sum_{i'j'k'} \sigma^{ii'}\sigma^{jj'}\sigma^{kk'}\mu_{i,j,k}\,\mu_{i',j',k'},$$

$$\gamma_{2,d} = \beta_{2,d} - d(d+2) = E\left\{\left[(Y - \mu)^\top\Sigma^{-1}(Y - \mu)\right]^2\right\} - d(d+2),$$

where $\mu$ and $\Sigma$ denote the mean vector and the covariance matrix of $Y$, respectively, $\mu_{i,j,k} = 
E[(Y_i - \mu_i)(Y_j - \mu_j)(Y_k - \mu_k)]$, and $\sigma^{ii'}$ denotes the $(i, i')$th entry of $\Sigma^{-1}$. 



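Proposition 5 Consider the scale mixture of skew-normal distributions $Y = \xi + \omega S Z$, where 
$Z \sim SN_d(0, \bar\Omega, \alpha)$. Then the Mardia indices of multivariate skewness and excess of kurtosis of 
$Y$ are, provided that the involved moments of $S$ exist, 

$$\gamma_{1,d} = (\gamma_1^*)^2 + 3(d-1)\,\frac{\left[E(S^3) - E(S)E(S^2)\right]^2}{\sigma_*^2\,E(S^2)^2}\,\frac{2\delta_*^2}{\pi},$$

$$\gamma_{2,d} = \beta_2^* + (d-1)(d+1)E(S^2)^{-2}E(S^4) + \frac{2(d-1)}{\sigma_*^2E(S^2)}\left\{E(S^4) + \left[E(S)^2E(S^2) - 2E(S)E(S^3)\right]\frac{2\delta_*^2}{\pi}\right\} - d(d+2),$$

where, using a self explanatory notation, the quantities $\gamma_1^*$, $\beta_2^*$ and $\sigma_*^2$ refer to the component 
$Y_1^*$ of the canonical form associated with $Y$, and $\delta_*$ is defined in Proposition 4. 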

Proof. In the proof some symbols introduced in Proposition 4 will be used. Since $\gamma_{1,d}$ and $\gamma_{2,d}$ are 
invariant with respect to invertible affine transforms, the canonical form $Y^*$ will be considered 
in place of $Y$. From (i) of Proposition 4 we know that the last $d - 1$ components of $Y^*$ are 
symmetric about zero; a first implication is that $\mu_{1,1,k} = 0$ for any $2 \le k \le d$. In addition, taking 
into account (iii), it follows that $\mu_{i,j,k} = 0$ for any choice of $i$, $j$ and $k$ in $\{2, \dots, d\}$. From (ii) 
we have $\sigma^{jj'} = 0$ for any $j \ne j'$, and consequently $\gamma_{1,d}$ reduces to 

$$\gamma_{1,d} = \frac{(\mu_{1,1,1})^2}{\sigma_*^6} + \frac{3}{\sigma_*^2\sigma^4}\sum_{i=2}^{d}(\mu_{1,i,i})^2.$$

Finally, by expressing $\mu_{1,i,i}$ in terms of non central moments and by applying (7), the first equality 
is proved. 

Let us denote by $\mu_{i,j,k,l}$ the generic entry of the fourth order central moment of $Y^*$; taking 
into account (i) and (ii) of Proposition 4 we have 

$$\beta_{2,d} = E\left\{\left[\frac{(Y_1^* - \mu_*)^2}{\sigma_*^2} + \sum_{i=2}^{d}\frac{Y_i^{*2}}{\sigma^2}\right]^2\right\} = \frac{\mu_{1,1,1,1}}{\sigma_*^4} + \sum_{i=2}^{d}\frac{\mu_{i,i,i,i}}{\sigma^4} + 2\sum_{i=2}^{d-1}\sum_{j=i+1}^{d}\frac{\mu_{i,i,j,j}}{\sigma^4} + 2\sum_{i=2}^{d}\frac{\mu_{1,1,i,i}}{\sigma_*^2\sigma^2},$$

where the expressions of $\mu_{i,i,i,i}$ and $\mu_{i,i,j,j}$ for $i$ and $j$ greater than 1 are given in (iv) of 
Proposition 4, and that of $\mu_{1,1,i,i}$ can be obtained with the aid of (7). After some algebra the second 
equality follows. QED 

This result shows that, if $Y$ is a scale mixture of skew-normal distributions, then $\gamma_{1,d}$ and 
$\gamma_{2,d}$ depend on the distribution of $S$, and on the underlying skew-normal variate only via the scalar 
quantity $\alpha_*$, or equivalently $\delta_*$, reinforcing its role as a summary quantity of the distribution 
shape. 

By comparing these expressions with the corresponding ones of the skew-normal 
distribution, given by (5) and (6), respectively, we can observe that they have a different structure. In 
particular, when a scale mixture of skew-normal distributions is considered, the two indices do 
not coincide with their univariate version evaluated with respect to the marginal distribution of 
the only skewed component of the variate in canonical form. 

It could be of interest to highlight the structure of $\beta_{2,d} = \gamma_{2,d} + d(d+2)$. It turns out 
that it is the sum of three terms: the univariate kurtosis index of $Y_1^*$, whose expression is given 
in Proposition 3, the kurtosis index $\beta_{2,d-1}$ of the $(d-1)$-dimensional scale mixture of normal 
distributions $(Y_2^*, \dots, Y_d^*)^\top$, which is given by $(d-1)(d+1)E(S^2)^{-2}E(S^4)$, and a term which 
is related to the fourth moment of $Y^*$ through $\mu_{1,1,i,i}$, for any $i \in \{2, \dots, d\}$. 

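When $Y \sim ST_d(\xi, \Omega, \alpha, \nu)$ explicit expressions of the two indices can be easily obtained 
taking into account the well known result 

$$E(S^m) = E(W^{-m/2}) = \frac{(\nu/2)^{m/2}\,\Gamma\{(\nu - m)/2\}}{\Gamma(\nu/2)},$$

leading to 

$$\gamma_{1,d} = (\gamma_1^*)^2 + 3(d-1)\,\frac{\mu_*^2}{(\nu-3)^2\sigma_*^2}, \qquad\text{if } \nu > 3,$$

$$\gamma_{2,d} = \beta_2^* + (d^2-1)\frac{\nu-2}{\nu-4} + \frac{2(d-1)}{\sigma_*^2}\left[\frac{\nu}{\nu-4} - \frac{(\nu-1)\mu_*^2}{\nu-3}\right] - d(d+2), \qquad\text{if } \nu > 4,$$

where 

$$\mu_* = \delta_*\left(\frac{\nu}{\pi}\right)^{1/2}\frac{\Gamma\{(\nu-1)/2\}}{\Gamma(\nu/2)}, \qquad \sigma_*^2 = \frac{\nu}{\nu-2} - \mu_*^2,$$

and the explicit expressions of $\gamma_1^*$ and $\gamma_2^* = \beta_2^* - 3$ are given in Azzalini and Capitanio (2003, 
p. 382). 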

Note that an equivalent expression, obtained through a different method, for $\beta_{2,d} = \gamma_{2,d} + 
d(d+2)$, is given in Kim and Mallik (2009). Finally, note also that the expressions of $\gamma_{1,d}$ and 
$\gamma_{2,d}$ given in Proposition 5 reduce to the corresponding ones for the skew-normal distribution 
when $S$ is such that $\mathrm{pr}(S = 1) = 1$, while $\gamma_{2,d}$ is the index of multivariate kurtosis of a scale 
mixture of normal distributions with mixing variable $S$ when $\delta_* = 0$. 
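
The reduction to the canonical form makes the multivariate indices easy to evaluate from four moments of $S$. The sketch below uses only the standard library and is written under the assumption that the canonical form has covariance matrix $\mathrm{diag}(\sigma_*^2, \sigma^2, \dots, \sigma^2)$ with $\sigma^2 = E(S^2)$, so that only the central moments $\mu_{1,1,1}$, $\mu_{1,i,i}$, $\mu_{1,1,1,1}$, $\mu_{i,i,i,i}$, $\mu_{i,i,j,j}$ and $\mu_{1,1,i,i}$ contribute. Function names and numerical values are illustrative.

```python
import math

def t_mixing_moment(m, nu):
    """E(S^m) for the skew t case, S = W^(-1/2), W ~ Gamma(nu/2, nu/2)."""
    if nu <= m:
        raise ValueError("the moment of order m exists only for nu > m")
    return (nu / 2.0) ** (m / 2.0) * math.exp(
        math.lgamma((nu - m) / 2.0) - math.lgamma(nu / 2.0))

def mixture_mardia(d, delta_star, m1, m2, m3, m4):
    """Mardia skewness and excess kurtosis of a d-dimensional scale mixture
    of skew-normal distributions, given delta_* and m_k = E(S^k)."""
    b2 = 2.0 / math.pi
    q = b2 * delta_star ** 2
    var1 = m2 - q * m1 ** 2                    # sigma_*^2, variance of Y_1^*
    # Third and fourth central moments of the skewed component Y_1^*.
    mu3 = math.sqrt(b2) * ((2.0 * b2 * m1 ** 3 - m3) * delta_star ** 3
                           + 3.0 * (m3 - m1 * m2) * delta_star)
    mu4 = (3.0 * m4 - 4.0 * q * m1 * m3 * (3.0 - delta_star ** 2)
           + 6.0 * q * m1 ** 2 * m2 - 3.0 * q ** 2 * m1 ** 4)
    gamma1_star = mu3 / var1 ** 1.5
    beta2_star = mu4 / var1 ** 2
    gamma_1d = gamma1_star ** 2 + 3.0 * (d - 1) * q * (m3 - m1 * m2) ** 2 / (var1 * m2 ** 2)
    gamma_2d = (beta2_star + (d - 1) * (d + 1) * m4 / m2 ** 2
                + 2.0 * (d - 1) / (var1 * m2) * (m4 + (m1 ** 2 * m2 - 2.0 * m1 * m3) * q)
                - d * (d + 2))
    return gamma_1d, gamma_2d

# Three-dimensional skew t with nu = 12 and delta_* = 0.7 (hypothetical values).
nu, d, delta_star = 12.0, 3, 0.7
moments = [t_mixing_moment(m, nu) for m in (1, 2, 3, 4)]
gamma_1d, gamma_2d = mixture_mardia(d, delta_star, *moments)
```

For $d = 1$ the function returns the squared Pearson skewness and the excess kurtosis of $Y_1^*$, and for all moments equal to one it returns the skew-normal values.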

The following proposition provides a further example of a pair of scatter matrices that can 
be used for obtaining the linear transform to convert a scale mixture of skew-normal variates into 
a canonical form. The proof of the proposition also shows that, if two scatter 
matrices are diagonal when the considered variate is in canonical form, then 
applying to them the procedure described in Proposition 2 yields a matrix $H$ that induces 
a canonical form. 



Proposition 6 Consider the scale mixture of skew-normal distributions $Y = \xi + \omega S Z$, where 
$Z \sim SN_d(0, \bar\Omega, \alpha)$, and define the scatter matrix 

$$\mathcal{K} = E\left\{\left[(Y - \mu)^\top\Sigma^{-1}(Y - \mu)\right](Y - \mu)(Y - \mu)^\top\right\}.$$

Let $M' = \Sigma^{-1/2}\mathcal{K}\,\Sigma^{-1/2}$, where $\Sigma^{1/2}$ is the unique positive definite symmetric square root of 
$\Sigma$, and $\Sigma$ is the covariance matrix of $Y$. Let $Q'\Lambda'Q'^\top$ denote the spectral decomposition of $M'$. 
Then the transform 

$$Y^* = H^\top(Y - \xi),$$

where $H = \Sigma^{-1/2}Q'$, converts $Y$ into a canonical form. 



Proof. By means of the results contained in Proposition 4 it is possible to show that when a 
scale mixture of skew-normal distributions is in canonical form, then both the scatter matrices 
$\mathcal{K}$ and $\Sigma$ are diagonal. Let $\mathcal{K}^* = H^\top\mathcal{K}H$ and $\Sigma^* = H^\top\Sigma H$ denote such matrices, where 
$H$ is a matrix such that $H^\top(Y - \xi)$ is in canonical form. The equality $M'q'_j = \lambda'_j q'_j$, where 
$q'_j$ is the j-th column of the matrix $Q'$ and $\lambda'_j = \Lambda'_{jj}$ is the corresponding eigenvalue, implies 
that the equality $\Sigma^{*-1}\mathcal{K}^*H^{-1}(\Sigma^{-1/2}q'_j) = \lambda'_j H^{-1}(\Sigma^{-1/2}q'_j)$ must also hold true; since both 
$\mathcal{K}^*$ and $\Sigma^*$ are diagonal, the equality is fulfilled when all the eigenvalues of $M'$ are equal, or 
when $H^{-1}(\Sigma^{-1/2}Q') \propto I_d$. The first circumstance is of no interest, because it would imply that 
we are considering two scatter matrices which are proportional; the second one implies that the 
columns of $\Sigma^{-1/2}Q'$ are proportional to the corresponding columns of $H$, and the proposition is 
proved. QED 

On the basis of Propositions 2 and 6 we see that the matrix $H$ that defines the canonical 
form can be obtained working with either the pair $(\Omega, \Sigma)$ or the pair $(\Sigma, \mathcal{K})$. 
However it is important to highlight the auxiliary information given by this technique, 
which essentially relies on a spectral decomposition. In particular, it is straightforward to note 
that the trace of the matrix $\Omega^{-1}\Sigma$, or equivalently, of $M$, is equal to the sum of the variances of 
the marginal univariate components of the canonical form, while the trace of the matrix $\Sigma^{-1}\mathcal{K}$, 
or equivalently, of $M'$, is equal to $\beta_{2,d}$. 

5 The mode of the multivariate skew-normal and skew t distributions 



The mode of the skew-normal and skew t distributions cannot be calculated in closed form, so 
one needs to resort to numerical methods. In this section the uniqueness of the mode 
in the d-dimensional case is proved, and it is shown that its computation can be reduced to an equivalent 
one-dimensional problem, drastically reducing the dimensionality of the original problem. From 
the expression of the mode which is obtained, it also turns out that the mode, the mean and the 
location parameter are aligned. More specifically, they lie in a one dimensional linear manifold 
of direction $\omega\delta$. Thus, the departure from symmetry of these distributions is characterized by a 
displacement of the probability mass along this direction. The above issues are also briefly discussed 
for the general case of scale mixtures of skew-normal distributions. 

For later use, we recall that the density function of a d-dimensional skew t variate as given 
by Azzalini and Capitanio (2003, expression 26) is 

$$f_Y(y) = 2\,t_d(y - \xi; \Omega, \nu)\,T_1\left\{\alpha^\top\omega^{-1}(y - \xi)\left(\frac{\nu + d}{Q_y + \nu}\right)^{1/2};\ \nu + d\right\}, \qquad y \in \mathbb{R}^d, \tag{9}$$

where $Q_y = (y - \xi)^\top\Omega^{-1}(y - \xi)$, $t_d(x; \Omega, \nu)$ is the density function of a d-dimensional t variate 
with $\nu$ degrees of freedom, and $T_1(x; \nu + d)$ is the scalar t distribution function with $\nu + d$ degrees 
of freedom. A random variable having density (9) will be denoted by $ST_d(\xi, \Omega, \alpha, \nu)$. 

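Proposition 7 Let $Z \sim SN_d(\xi, \Omega, \alpha)$. Then the unique mode of $Z$ is 

$$M_0 = \xi + \frac{m_0^*}{\alpha_*}\,\omega\bar\Omega\alpha = \xi + \frac{m_0^*}{\delta_*}\,\omega\delta,$$

where $m_0^*$ is the mode of a scalar $SN_1(0, 1, \alpha_*)$ random variable. 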

Proof. Consider first the mode of the canonical form $Z^* \sim SN_d(0, I_d, \alpha_{Z^*})$. If we calculate the 
mode by imposing the gradient of the density function to be equal to the null vector, the system 
of equations to be solved turns out to be 

$$\begin{aligned}
z_1\Phi(\alpha_* z_1) - \phi_1(\alpha_* z_1)\,\alpha_* &= 0,\\
z_2\,\Phi(\alpha_* z_1) &= 0,\\
&\ \,\vdots\\
z_d\,\Phi(\alpha_* z_1) &= 0,
\end{aligned}$$

where $z_i$, $i = 1, 2, \dots, d$, denotes the ith entry of the vector $z \in \mathbb{R}^d$. The last $d - 1$ equations are 
satisfied when $z_i = 0$ for $i = 2, \dots, d$, whilst the unique root (for the uniqueness see Azzalini, 
1985, Property D) of the first one corresponds to the mode, say $m_0^*$, of an $SN_1(0, 1, \alpha_*)$, so that 
the mode of $Z^*$ is the vector $M_0^* = (m_0^*, 0, \dots, 0)^\top = (m_0^*/\alpha_*)\,\alpha_{Z^*}$. Recalling that $Z = 
\xi + \omega C^\top P Z^*$ and $\alpha_{Z^*} = P^\top C\alpha$, and taking into account that the mode is equivariant with 
respect to affine transformations, the mode of $Z$ turns out to be 

$$M_0 = \xi + \frac{m_0^*}{\alpha_*}\,\omega C^\top P P^\top C\alpha = \xi + \frac{m_0^*}{\alpha_*}\,\omega\bar\Omega\alpha = \xi + \frac{m_0^*}{\delta_*}\,\omega\delta,$$

where the last equality follows taking into account (3) and (4). QED 
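
The one-dimensional reduction can be sketched in code. The example below assumes NumPy and solves $z\,\Phi(\alpha_* z) - \alpha_*\phi(\alpha_* z) = 0$ by bisection, which is a choice made here for simplicity and not a method prescribed by the paper. It then maps the scalar root to the d-dimensional mode through $\xi + (m_0^*/\alpha_*)\,\omega\bar\Omega\alpha$, using the identity $\omega\bar\Omega\alpha = \Omega\omega^{-1}\alpha$. Function names and parameter values are illustrative.

```python
import math
import numpy as np

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def scalar_sn_mode(alpha_star, tol=1e-13):
    """Mode of SN_1(0, 1, alpha_star), found by bisection on [0, 1]."""
    if alpha_star == 0.0:
        return 0.0
    a = abs(alpha_star)
    g = lambda z: z * Phi(a * z) - a * phi(a * z)
    lo, hi = 0.0, 1.0          # g(0) < 0 and g(1) > 0 for every a > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    # The mode changes sign with alpha_star.
    return math.copysign(0.5 * (lo + hi), alpha_star)

def sn_mode(xi, Omega, alpha):
    """Mode of SN_d(xi, Omega, alpha) via the one-dimensional reduction."""
    xi, Omega, alpha = (np.asarray(a, dtype=float) for a in (xi, Omega, alpha))
    w = np.sqrt(np.diag(Omega))
    direction = Omega @ (alpha / w)                    # omega * Omega_bar * alpha
    alpha_star = math.sqrt(float((alpha / w) @ direction))
    if alpha_star == 0.0:
        return xi.copy()
    return xi + scalar_sn_mode(alpha_star) / alpha_star * direction

# Illustrative values (hypothetical).
xi = np.array([0.0, 1.0, -1.0])
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 1.0, 0.2],
                  [0.5, 0.2, 2.25]])
alpha = np.array([3.0, -1.0, 0.5])
M0 = sn_mode(xi, Omega, alpha)
```

A simple sanity check is that the gradient of the log density, $-\Omega^{-1}(z - \xi) + \{\phi(u)/\Phi(u)\}\,\omega^{-1}\alpha$ with $u = \alpha^\top\omega^{-1}(z - \xi)$, vanishes at `M0`.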
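Proposition 8 Let $Y \sim ST_d(\xi, \Omega, \alpha, \nu)$. Then the unique mode of $Y$ is 

$$M_0 = \xi + \frac{y_0^*}{\alpha_*}\,\omega\bar\Omega\alpha = \xi + \frac{y_0^*}{\delta_*}\,\omega\delta,$$

where $y_0^* \in \mathbb{R}$ is the unique solution of the equation 

$$y\,(\nu + d)^{1/2}\,T_1\{w(y); \nu + d\} - t_1\{w(y); \nu + d\}\,\nu\,\alpha_*\,(\nu + y^2)^{-1/2} = 0,$$

where $w(y) = \alpha_* y\left(\dfrac{\nu + d}{\nu + y^2}\right)^{1/2}$. 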



Proof. As for the skew-normal case, the canonical form $Y^* \sim ST_d(0, I_d, \alpha_{Y^*}, \nu)$, where $\alpha_{Y^*} = 
(\alpha_*, 0, \dots, 0)^\top$, is considered, and the mode is calculated by imposing the gradient of the density 
function to be equal to the null vector. The system of equations to solve turns out to be 

$$\begin{aligned}
x_1 T_1\{\alpha_* x_1 c(x); \nu + d\} - \frac{t_1\{\alpha_* x_1 c(x); \nu + d\}}{(\nu + x^\top x)^{1/2}(\nu + d)^{1/2}}\,(\nu + x^\top x - x_1^2)\,\alpha_* &= 0,\\
x_2\left[T_1\{\alpha_* x_1 c(x); \nu + d\} + \frac{t_1\{\alpha_* x_1 c(x); \nu + d\}}{(\nu + x^\top x)^{1/2}(\nu + d)^{1/2}}\,x_1\alpha_*\right] &= 0,\\
&\ \,\vdots\\
x_d\left[T_1\{\alpha_* x_1 c(x); \nu + d\} + \frac{t_1\{\alpha_* x_1 c(x); \nu + d\}}{(\nu + x^\top x)^{1/2}(\nu + d)^{1/2}}\,x_1\alpha_*\right] &= 0,
\end{aligned}$$

where $x = (x_1, x_2, \dots, x_d)^\top$ and $c(x) = \{(\nu + d)/(\nu + x^\top x)\}^{1/2}$. First note that the function 
on the left hand side of the first equation can be equal to zero only if $x_1 > 0$. This fact implies 
that the remaining equations are satisfied if and only if $x_i = 0$ for $i = 2, \dots, d$. Hence the 
mode of $Y^*$ is $M_0^* = (y_0^*, 0, \dots, 0)^\top$, where the scalar value $y_0^* > 0$ is the solution of 

$$y\,T_1\{w(y); \nu + d\} - \frac{t_1\{w(y); \nu + d\}}{(\nu + y^2)^{1/2}(\nu + d)^{1/2}}\,\nu\,\alpha_* = 0, \tag{10}$$

where $w(y) = \alpha_* y\left(\dfrac{\nu + d}{\nu + y^2}\right)^{1/2}$. To see that equation (10) admits a unique solution, first notice 
that when $y > 0$ the function on the left hand side is the difference between a strictly increasing 
function and a strictly decreasing one. Furthermore, when $y = 0$ the latter is greater than zero 
while the former is equal to zero, and as $y \to \infty$ the latter goes to zero while the former goes to 
$\infty$. Hence, there exists a unique point in which their difference is equal to zero. The expression 
of the mode of $Y$ is obtained on the basis of arguments analogous to those used for the mode of 
a multivariate skew-normal distribution. QED 
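
The scalar equation for the skew t case can be solved in the same spirit. The sketch below uses only the standard library and assumes $\alpha_* > 0$. The t distribution function is approximated by a composite Simpson rule, a stand-in chosen here to keep the example self-contained (a statistical library routine would normally be used), and the root is located by bisection. All function names are illustrative.

```python
import math

def t_pdf(w, k):
    """Density of the scalar Student t distribution with k degrees of freedom."""
    logc = math.lgamma((k + 1) / 2.0) - math.lgamma(k / 2.0) - 0.5 * math.log(k * math.pi)
    return math.exp(logc - (k + 1) / 2.0 * math.log1p(w * w / k))

def t_cdf(w, k, n=2000):
    """Composite Simpson approximation of the t distribution function (n even)."""
    if w == 0.0:
        return 0.5
    h = abs(w) / n
    total = t_pdf(0.0, k) + t_pdf(abs(w), k)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, k)
    return 0.5 + math.copysign(total * h / 3.0, w)

def mode_equation(y, alpha_star, nu, d):
    """Left hand side of the scalar equation whose root is the mode."""
    k = nu + d
    w = alpha_star * y * math.sqrt(k / (nu + y * y))
    return y * t_cdf(w, k) - t_pdf(w, k) * nu * alpha_star / math.sqrt((nu + y * y) * k)

def canonical_st_mode(alpha_star, nu, d, tol=1e-10):
    """First coordinate of the mode of the canonical skew t variate (alpha_star > 0)."""
    if alpha_star <= 0.0:
        raise ValueError("this sketch assumes alpha_star > 0")
    lo, hi = 0.0, 1.0
    while mode_equation(hi, alpha_star, nu, d) <= 0.0:   # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mode_equation(mid, alpha_star, nu, d) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values: alpha_* = 2, nu = 5, d = 3.
y0 = canonical_st_mode(2.0, 5.0, 3)
```

The d-dimensional mode then follows from the same affine map used in the skew-normal case, with `y0` in place of the scalar skew-normal mode.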

Note that a different proof for the uniqueness of the mode for the multivariate skew t 
distribution has been independently developed by Azzalini and Regoli (2012). 

The issue of finding the mode of other members of the family of scale mixture of skew- 
normal distributions can be tackled in a similar way. An open problem, which is not investigated 
here, is to assess the uniqueness of the solution. 

It is straightforward to see that if a point of $\mathbb{R}^d$ is the mode of the canonical form of a 
d-dimensional scale mixture of skew-normal variates, then it must be of the form $(y_0^*, 0, \dots, 0)^\top$, 
where the real number $y_0^*$ is such that 

$$\int_0^\infty s^{-d-1}\,\phi_1\!\left(\frac{y_0^*}{s}\right)\left\{\frac{y_0^*}{s}\,\Phi\!\left(\frac{\alpha_* y_0^*}{s}\right) - \alpha_*\,\phi_1\!\left(\frac{\alpha_* y_0^*}{s}\right)\right\} f_S(s)\,ds = 0,$$

where $f_S(s)$ denotes the density function of $S$. This implies that, as for the skew-normal and 
skew t distributions, the mode of a scale mixture of skew-normal distributions will be of the 
form 